1.  INTRODUCTION


The unit learning curve plays a prominent role in DoD cost analysis. In
those cases when the model accurately describes the real-life situation, i.e.,
when the model is properly applied to the data, it can be a powerful tool for
predicting unit production costs. There are, however, some unique estimation
problems inherent in the model.

The  usual  method  of  generating  predicted  unit  production  costs  attempts  to 
extend  properties  of  least  squares  estimators  to  non-linear  functions  of  these 
estimators.  The  result  is  biased  estimates  of  unit  production  costs.  Another 
problem  common  to  many  learning  curve  applications  is  estimating  lot  midpoints 
and  slope  coefficients  when  both  estimates  depend  on  each  other  and  both 
quantities  are  unknown. 

This paper addresses the two problems discussed above and presents an
alternative procedure for estimating unit learning curves. A simple modification
to the usual estimators results in new estimators which yield unbiased
estimates of unit production costs. The lot midpoint problem is overcome by
another simple and widely used estimation technique, that of iterative
ordinary least squares (OLS).
2.  BACKGROUND


The learning curve equation is frequently employed in DoD cost analysis.
Although there are several variations of the general form of the equation, the
one considered here is that of a unit learning curve:

    $y = a x^b$    (2.1)

where y refers to the cost of unit x of a specified manufactured item and a
and b are parameters to be estimated. Frequently a is referred to as the cost
of unit one or the $T_1$ value, since when x = 1, y = a, regardless of the value
of b. In learning curve applications b is a negative exponent usually ranging
between zero and one in absolute value. Hence, as the number of units
increases, the unit cost will decrease.
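A small numerical illustration of equation (2.1) may help fix ideas. The sketch below is not part of the original analysis; the function names are hypothetical, and the "80 percent curve" convention (each doubling of x multiplies unit cost by 0.80, so that b = ln 0.80 / ln 2) is a common cost-analysis convention assumed here rather than taken from this paper.

```python
import math

def unit_cost(a, b, x):
    """Cost of unit x under the unit learning curve y = a * x**b (equation 2.1)."""
    return a * x ** b

def exponent_from_learning_rate(rate):
    """Exponent b for a curve on which each doubling of x multiplies cost by `rate`.

    Assumed convention, not stated in the paper: rate = 2**b.
    """
    return math.log(rate) / math.log(2.0)

b = exponent_from_learning_rate(0.80)   # negative, between 0 and 1 in absolute value
first_unit = unit_cost(1000.0, b, 1)    # equals a, whatever the value of b
second_unit = unit_cost(1000.0, b, 2)   # 80 percent of the first-unit cost
```

With a = 1000 the first unit costs exactly a, and the second unit costs 0.80 times as much, which is the behavior described in the text for x = 1 and for increasing x.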

The stochastic¹ model corresponding to the functional model (2.1) is
usually assumed to be

    $y = a x^b e^u$    (2.2)


1.  The  model  (2.1)  is  a  mathematical  function  while  the  stochastic  model 
(2.2)  includes  the  disturbance  term.  Stochastic  in  this  case  is  meant  to 
imply  random. 

2.  The letter e is defined to be the base of the natural logarithms, i.e.,
e = 2.71828 . . .


which includes a multiplicative function of the disturbance term u. This
error term³ is assumed to be a well-behaved, normally distributed random
variable with zero mean and constant variance $\sigma^2$. Thus, y has a
lognormal distribution. (See Appendix A.)


Given these results and a fixed value (but any fixed value) of the explanatory
variable x, the mean or expected value⁴ of y is

    $E(y) = a x^b e^{0.5\sigma^2}$    (2.3)

and the median⁵ of y is

    $M(y) = a x^b$    (2.4)


3.  Disturbance  term  and  error  term  are  frequently  used  interchangeably  and 
refer  to  the  randomness  in  y  not  accounted  for  by  the  functional  form  of 
the  equation. 

4.  The mean or expected value of y can be interpreted as the average value of
y observed from repeated observation on the same value of x. It is
usually denoted by the letter E.

5.  The median of y is the "middle value" of y, or the value of y such that
half of the observations are greater in value than y and half are less in
value. It is usually denoted by the letter M.


Details of these derivations can be found in Goldberger [1]⁶. Note that the
mean and median of y are not the same. This is due to the fact that the
lognormal distribution is not symmetric.⁷ Recall that in the case of a linear
regression function with additive error term the median is equal to the mean,
provided the error term is normally distributed.
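The gap between the mean (2.3) and the median (2.4) can be checked numerically. The following is a minimal sketch, assuming the disturbance u is normally distributed with variance sigma2; the function names, the sample size, and the seed are illustrative choices, not values from the paper.

```python
import math
import random
import statistics

def median_cost(a, b, x):
    """Median of y from equation (2.4)."""
    return a * x ** b

def mean_cost(a, b, x, sigma2):
    """Mean of y from equation (2.3)."""
    return a * x ** b * math.exp(0.5 * sigma2)

def simulate_costs(a, b, x, sigma2, n=100_000, seed=1):
    """Draw y = a * x**b * e**u with u ~ N(0, sigma2); return sample mean and median."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    draws = [a * x ** b * math.exp(rng.gauss(0.0, sd)) for _ in range(n)]
    return sum(draws) / n, statistics.median(draws)
```

Calling `simulate_costs(100.0, -0.3, 10, 0.25)` should give a sample mean near `mean_cost(100.0, -0.3, 10, 0.25)` and a sample median near `median_cost(100.0, -0.3, 10)`, with the mean the larger of the two because the lognormal distribution is skewed to the right.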

Since  equation  (2.2)  is  not  linear  in  the  parameters  a  and  b,  one  cannot 
estimate  these  parameters  by  simple  linear  regression.  The  usual  solution  to 
this  problem  is  to  proceed  as  follows: 

1)  Transform equation (2.1) by taking logs⁸ to yield

    $\ln y = \ln a + b \ln x$    (2.5)

6.  A  detailed  and  rigorous  theoretical  account  of  much  of  the  underlying 
theory  upon  which  this  paper  is  based  can  be  found  in  a  paper  by  Arthur 
Goldberger  which  appeared  in  Econometrica  in  1968. 

7.  A random variable which has a symmetric distribution has the same
probability of being n units above the mean as it does of being n units below
the mean. An example of a symmetric distribution is the normal distribution.

8.  The letters ln are meant to represent the natural logarithm of the
expression following. Natural logarithms are used exclusively throughout
this paper.


2)  Perform ordinary least squares on the linear equation (2.5) to obtain
the estimated equation

    $\widehat{\ln y} = \widehat{\ln a} + \hat{b} \ln x$    (2.6)

where the "hats" denote OLS estimates.

3)  Take the antilog⁹ of the right-hand side of equation (2.6) and use the
resulting equation to generate estimates of y in the original unlogged
equation, viz.,

    $\hat{y} = e^{\hat{w}}$    (2.7)

where $\hat{w} = \widehat{\ln a} + \hat{b} \ln x$.
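The three steps above can be sketched in a few lines. This is an illustrative implementation of the customary procedure only, written with the standard closed-form expressions for simple-regression OLS; the function names are hypothetical.

```python
import math

def fit_log_linear(xs, ys):
    """Steps 1) and 2): OLS of ln y on ln x. Returns (ln_a_hat, b_hat)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b_hat = sxy / sxx
    ln_a_hat = mean_y - b_hat * mean_x
    return ln_a_hat, b_hat

def customary_prediction(ln_a_hat, b_hat, x):
    """Step 3): antilog of the fitted log value, i.e. e**w_hat."""
    w_hat = ln_a_hat + b_hat * math.log(x)
    return math.exp(w_hat)
```

On data that lie exactly on a curve y = a x^b the fit recovers a and b. With noisy data the prediction from `customary_prediction` is the quantity whose bias is examined next.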

The least squares estimates obtained in step 2) are unbiased¹⁰ and have
the minimum variance of any unbiased estimators. These desirable properties
result from the least squares estimation technique; see, for example, Draper
and Smith [2, p. 87]. It is important to note here that these properties apply


9.  To take the antilog of a logarithm one merely raises e to the power of the
logarithm. Hence, $e^{\ln x} = x$.

10.  An estimator is said to be unbiased if its expected value is equal to
the parameter it is meant to estimate. Thus, $E(\hat{x}) = x$ implies that
$\hat{x}$ is an unbiased estimator of x.


only to the parameters themselves and not to exponential functions of these
parameters. Indeed, because of the convexity¹¹ of the exponential function,

    $E(e^{\hat{w}}) = e^{w + 0.5\,\mathrm{var}(\hat{w})}$    (2.8)

    $= a x^b e^{0.5 m^* \sigma^2}$    (2.9)

where $w = \ln a + b \ln x$ and $m^*\sigma^2$ is the variance of $\hat{w}$, the
predicted value of ln y, for any given value of x. Specifically, this variance
is the familiar covariance result,

    $m^*\sigma^2 = \mathrm{var}(\widehat{\ln a}) + (\ln x)^2\,\mathrm{var}(\hat{b}) + 2 \ln x\,\mathrm{cov}(\widehat{\ln a}, \hat{b})$.

Thus, in step 3), $e^{\hat{w}}$ is not an unbiased estimator of either the mean
or the median of y. (Compare equation (2.9) with equations (2.3) and (2.4).)
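For a straight-line fit of ln y on ln x, the covariance expression for the variance of the predicted log value reduces to the standard simple-regression form m* = 1/n + (ln x - mean)^2 / Sxx, where the mean and Sxx are computed from the observed ln x values. That reduction is a textbook identity assumed here, not a formula stated in this paper. The sketch below uses it to evaluate the factor by which the customary estimator overstates the median, per equations (2.9) and (2.4); the function names are hypothetical.

```python
import math

def m_star(log_xs, log_x0):
    """Multiplier m* such that var(w_hat) = m* * sigma^2 at ln x = log_x0.

    Assumes simple regression of ln y on ln x with observed values log_xs.
    """
    n = len(log_xs)
    mean = sum(log_xs) / n
    sxx = sum((u - mean) ** 2 for u in log_xs)
    return 1.0 / n + (log_x0 - mean) ** 2 / sxx

def customary_bias_factor(m, sigma2):
    """E(e**w_hat) divided by the median a * x**b, i.e. exp(0.5 * m* * sigma^2)."""
    return math.exp(0.5 * m * sigma2)
```

The multiplier is smallest at the mean of the observed ln x values and grows as the prediction point moves away from it, so the bias of the customary estimator is largest for extrapolations such as the first unit or units far beyond the observed lots.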


11.  The expected value of a convex function of a random variable is never
less than the convex function of the expected value of that random
variable. For the exponential function and a random variable x that is not
constant the inequality is strict, i.e., $E(e^x) > e^{E(x)}$. See Mood
et al [3, p. 72], for details.


3.  UNBIASED  ESTIMATORS 


It is clear from the results of the last section that the customary
procedure for estimating learning curves results in biased estimates. It
would be desirable to modify the estimator $e^{\hat{w}}$ discussed in the last
section so as to yield unbiased estimates of the mean and
median of y. Fortunately, such a modification has been developed by
Goldberger [1].


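Goldberger has developed correction factors (see Appendix B), $F_M$ and $F_E$,
such that

    $E(F_M) = e^{-0.5 m^* \sigma^2}$    (3.1)

    $E(F_E) = e^{0.5\sigma^2} e^{-0.5 m^* \sigma^2}$    (3.2)

The products of the estimator $e^{\hat{w}}$ and the correction factors result in
unbiased estimators of the median and mean of y, i.e.,

    $E(e^{\hat{w}} F_M) = e^{w} e^{0.5 m^* \sigma^2} e^{-0.5 m^* \sigma^2} = e^{w} = a x^b$    (3.3)

    $E(e^{\hat{w}} F_E) = e^{w} e^{0.5 m^* \sigma^2} e^{0.5\sigma^2} e^{-0.5 m^* \sigma^2} = e^{w} e^{0.5\sigma^2} = a x^b e^{0.5\sigma^2}$    (3.4)

(Each expectation factors into a product because, with normally distributed
errors, $\hat{w}$ is independent of the residual variance estimate on which the
correction factors depend.) Comparing equations (3.3) and (3.4) with equations
(2.4) and (2.3), respectively, will verify the fact that these estimators are
unbiased.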

Although the correction factors $F_M$ and $F_E$ involve infinite sums, the sums
converge rather quickly to within any preassigned tolerance. This property of
rapid convergence and the use of digital computers make these estimators a
desirable alternative to traditional estimation techniques.

4.  LOT  MIDPOINTS

Having solved the problem of biased estimators, we consider another
problem associated with estimating unit learning curves. The unit learning
curve relates unit cost or labor hours to the number of items produced. DoD
contractors, however, usually account for costs by the lot rather than by the
unit. Hence, while the average cost of a lot is known, the quantity associated
with it, or the lot's midpoint, is not. Specifically, the midpoint of a
lot will lie somewhere between the lot's first unit and its arithmetic
midpoint, with the exact location depending on the curve's slope. An
unfortunate dilemma therefore emerges: lot midpoints cannot be computed without
knowing the slope, and the slope cannot be estimated without knowing the lot
midpoints.

In  an  attempt  to  resolve  this  perplexing  problem,  the  following  procedure 
is  suggested: 

1)  Estimate  the  slope  using  approximations  to  true  lot  midpoints. 


2)  Compute  new  lot  midpoints  based  on  the  previously  estimated  slope. 

3)  Repeat steps 1) and 2) until the change between successive estimates of
the slope is smaller than some preassigned tolerance.
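The iteration just listed can be sketched as follows. The paper does not state a formula for the lot midpoint, so this sketch assumes the usual definition: the midpoint Q of a lot running from unit F to unit L is the unit number whose cost on the curve equals the lot's average unit cost, which gives Q = [ (1/n) * sum of x^b over the lot ]^(1/b). The arithmetic midpoint is used as the starting approximation. All function names, the tolerance, and the iteration cap are illustrative assumptions.

```python
import math

def ols_log_fit(log_x, log_y):
    """Simple-regression OLS of log_y on log_x. Returns (ln_a_hat, b_hat)."""
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b_hat = sxy / sxx
    return mean_y - b_hat * mean_x, b_hat

def lot_midpoint(first, last, b):
    """Unit number whose curve cost equals the lot's average unit cost (assumed definition)."""
    units = range(first, last + 1)
    n = last - first + 1
    if abs(b) < 1e-12:
        # Limit as b -> 0: the geometric mean of the unit numbers.
        return math.exp(sum(math.log(x) for x in units) / n)
    return (sum(x ** b for x in units) / n) ** (1.0 / b)

def iterative_ols(lots, avg_costs, tol=1e-8, max_iter=50):
    """lots: list of (first_unit, last_unit); avg_costs: average unit cost of each lot."""
    log_y = [math.log(c) for c in avg_costs]
    mids = [(first + last) / 2.0 for first, last in lots]  # starting approximation
    b_prev = None
    for iteration in range(1, max_iter + 1):
        # Step 1): estimate the slope from the current midpoints.
        ln_a_hat, b_hat = ols_log_fit([math.log(q) for q in mids], log_y)
        # Step 3): stop when successive slope estimates agree to within tol.
        if b_prev is not None and abs(b_hat - b_prev) < tol:
            return ln_a_hat, b_hat, mids, iteration
        # Step 2): recompute the midpoints from the new slope.
        mids = [lot_midpoint(first, last, b_hat) for first, last in lots]
        b_prev = b_hat
    raise RuntimeError("iterative OLS did not converge")
```

The explicit `max_iter` cap and the raised error reflect the possibility, discussed below, that the iteration fails to converge on some data sets; a caller can then fall back on a rule-of-thumb midpoint.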

In a sampling experiment, Flynn [5] examined the properties of the OLS
estimator of a unit learning curve when lot midpoints are iteratively
estimated.

The results of this sampling experiment showed that mean iterative OLS
values were always very close to mean non-iterative values based on true lot
midpoints. The iterative OLS estimator appeared to be unbiased in the samples
examined. The estimator is not without problems, however, for sometimes it
fails to converge (48 out of 12,000 regression equations in the sampling
experiment). Frequency of failure seems to increase as $R^2$ decreases and as
lot quantities increase.

Fortunately, $R^2$ values in real-world learning curve estimation are typically
very high, thus minimizing the chance of non-convergence. In the rare
event that iterative OLS breaks down, common rules of thumb can be applied to
compute lot midpoints. It was also found that this estimator's iterative
algorithm usually converges to four decimal places after only three to five
iterations.


5.  AN  ALTERNATIVE  ESTIMATION  PROCEDURE 


Based  on  the  results  of  the  preceding  sections,  the  following  estimation 
procedure  is  suggested  as  a  desirable  alternative  to  the  customary  procedure. 

1)  Transform equation (2.1) by taking logs to yield

    $\ln y = \ln a + b \ln x$    (5.1)

2)  Use iterative OLS to estimate lot midpoint quantities and the OLS
estimates

    $\widehat{\ln y} = \widehat{\ln a} + \hat{b} \ln x = \hat{w}$    (5.2)

3)  To predict new y values for any given x value use the estimators

    $e^{\hat{w}} F_E$  or  $e^{\hat{w}} F_M$

depending on whether an estimate of the mean or median is wanted. The
resulting estimates are unbiased and have performed consistently in
sampling experiments.


To get an idea of the magnitude of the bias introduced by the customary
estimation procedure, we present a sample of 10 learning curve data sets.
These data sets are chosen from Navy aircraft and missile acquisition
programs. They were not chosen randomly, but instead were chosen to indicate
the wide range of bias possible using the usual estimation procedure.

Figure 5.1 shows estimated percent bias for median predicted $T_1$ values of
the 10 samples when compared to the unbiased procedure suggested in this
paper. Note how the estimated percent bias is directly related to the
variance of the predicted y values, viz., $m^*\sigma^2$. In this case the
predicted y values and $T_1$ values are the same, since x = 1.

    Sample Size    Variance ($m^*\sigma^2$)    Estimated % Bias

        10               .00275                    .14
         8               .00429                    .21
         8               .00454                    .28
         8               .01053                    .53
         3               .01071                    .54
        10               .01204                    .60
        14               .08141                   4.17
         3               .11253                   5.90
         5               .30523                  17.07
         4               .72146                  49.13

Figure 5.1 - Estimated % Bias of Median $T_1$ Values
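As a rough check on the magnitudes in Figure 5.1, equations (2.9) and (2.4) imply that the customary estimator overstates the median by the factor exp(0.5 m* sigma^2). The sketch below simply plugs a tabulated variance into that factor. This is a large-sample approximation assumed here for illustration, not the computation behind the figure, which presumably used the exact correction factor; the function name is hypothetical.

```python
import math

def approx_percent_bias(variance):
    """Approximate percent bias of e**w_hat relative to the median: 100 * (exp(0.5 * v) - 1)."""
    return 100.0 * (math.exp(0.5 * variance) - 1.0)

# Variances taken from the first, second, and fourth rows of Figure 5.1.
for v in (0.00275, 0.00429, 0.01053):
    print(f"{v:.5f} -> {approx_percent_bias(v):.2f} percent")
```

For these small variances the approximation reproduces the tabulated 0.14, 0.21, and 0.53 percent. For the larger variances in the figure it falls somewhat below the tabulated values, which is consistent with the exact factor depending on the degrees of freedom as well as on the variance.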


Flynn  [5]  and  Dagel  [4]  have  developed  computer  software  to  implement  the 
alternative  estimation  procedure  described  in  this  section.  The  FORTRAN-77 
code  is  relatively  compact  and  is  currently  being  run  on  a  VAX  computer.  It 
should  present  few  problems  to  adapt  this  code  to  personal  computers  such  as 


APPENDIX  A 


A  NOTE  ON  THE  MEAN  AND  VARIANCE  OF  A  LOGNORMALLY  DISTRIBUTED  RANDOM  VARIABLE. 

If Y is a positive random variable and if we define a new random variable

    $X = \ln Y$

with X having a normal distribution, i.e. $X \sim N(\mu, \sigma^2)$, then

    $Y = e^X$

has a lognormal distribution. The probability density function (p.d.f.) of X
is

    $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-(x - \mu)^2 / 2\sigma^2\right]$.

Hence the mean of Y can be derived in a straightforward manner by evaluating
the integral

    $E(Y) = \int_{-\infty}^{\infty} e^x f_X(x)\,dx$    [3, pp. 176-177]

    $= e^{\mu + 0.5\sigma^2}$

The variance of Y can be derived in a similar manner by evaluating $E(Y^2)$ and
then using the fact that

    $\mathrm{var}(Y) = E(Y^2) - [E(Y)]^2$.

This results in

    $\mathrm{var}(Y) = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}$.

Note that when $X \sim N(0, \sigma^2)$, i.e. when $\mu = 0$,

    $E(Y) = e^{0.5\sigma^2}$

    $\mathrm{var}(Y) = e^{2\sigma^2} - e^{\sigma^2}$.

Details of these derivations can be found in Dagel [4].
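For readers who want the intermediate step, the integral for E(Y) can be evaluated by completing the square in the exponent. The following is a standard derivation written out under the notation of this appendix, offered as a sketch rather than as the derivation given in the cited references.

```latex
\begin{align*}
E(Y) &= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}
        \exp\!\left[x - \frac{(x-\mu)^2}{2\sigma^2}\right] dx \\
     &= e^{\mu + 0.5\sigma^2} \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}
        \exp\!\left[-\frac{(x-\mu-\sigma^2)^2}{2\sigma^2}\right] dx
      = e^{\mu + 0.5\sigma^2}, \\
E(Y^2) &= E\!\left(e^{2X}\right) = e^{2\mu + 2\sigma^2}, \\
\mathrm{var}(Y) &= E(Y^2) - [E(Y)]^2 = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}.
\end{align*}
```

The second line uses the identity $x - (x-\mu)^2/2\sigma^2 = \mu + 0.5\sigma^2 - (x-\mu-\sigma^2)^2/2\sigma^2$; the remaining integrand is a normal density with mean $\mu + \sigma^2$ and therefore integrates to one. The expression for $E(Y^2)$ follows from the same argument applied to $2X \sim N(2\mu, 4\sigma^2)$.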


APPENDIX  B 


The exact form of the correction factor as given by Goldberger is

    $F(w; v, c) = \sum_{j=0}^{\infty} f_j \frac{(cw)^j}{j!}$

where

    $f_j = (v/2)^j\,\Gamma(v/2) / \Gamma((v/2) + j)$

and where

    w = variance of the estimator
    c = a constant
    v = degrees of freedom.

The correction factor for the median of y is

    $F_M(w; v, c)$

where

    $w = 0.5 s^2$
    $c = -m^*$

The correction factor for the mean of y is

    $F_E(w; v, c)$

where

    $w = 0.5 s^2$
    $c = (1 - m^*)$

Details of these derivations can be found in Goldberger [1].
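The series for the correction factor is easy to evaluate term by term, because the ratio of successive terms is (v/2) c w / ((v/2 + j - 1) j) and no gamma functions need to be computed. The sketch below is an illustration only and is not the FORTRAN-77 code mentioned in Section 5. It assumes that s^2 is the usual residual variance estimate with v degrees of freedom, and it takes c = -m* for the median and c = 1 - m* for the mean, which are the constants that equations (3.1) and (3.2) imply when w = 0.5 s^2. The function names, the tolerance, and the term cap are hypothetical.

```python
import math

def correction_factor(w, v, c, tol=1e-12, max_terms=200):
    """F(w; v, c) = sum over j >= 0 of f_j (c w)^j / j!, f_j = (v/2)^j G(v/2) / G(v/2 + j)."""
    half_v = 0.5 * v
    term = 1.0    # j = 0 term, since f_0 = 1
    total = 1.0
    for j in range(1, max_terms + 1):
        # Ratio of term j to term j - 1; avoids evaluating gamma functions directly.
        term *= half_v * c * w / ((half_v + j - 1) * j)
        total += term
        if abs(term) < tol * abs(total):
            return total
    raise RuntimeError("correction factor series did not converge")

def median_factor(s2, v, m):
    """F_M with w = 0.5 * s2 and c = -m* (constant inferred from equation 3.1)."""
    return correction_factor(0.5 * s2, v, -m)

def mean_factor(s2, v, m):
    """F_E with w = 0.5 * s2 and c = 1 - m*."""
    return correction_factor(0.5 * s2, v, 1.0 - m)
```

As the degrees of freedom grow, each f_j approaches one and the factor approaches exp(c w), so the corrected estimators reduce to the familiar large-sample adjustments. The unbiased predictions of Section 3 are then obtained by multiplying e^w_hat by `median_factor` or `mean_factor`.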